End-to-End Deep Learning Framework for Speech Paralinguistics Detection Based on Perception Aware Spectrum

نویسندگان

  • Danwei Cai
  • Zhidong Ni
  • Wenbo Liu
  • Weicheng Cai
  • Gang Li
  • Ming Li
چکیده

In this paper, we propose an end-to-end deep learning framework to detect speech paralinguistics using perception aware spectrum as input. Existing studies show that speech under cold has distinct variations of energy distribution on low frequency components compared with the speech under ‘healthy’ condition. This motivates us to use perception aware spectrum as the input to an end-to-end learning framework with small scale dataset. In this work, we try both Constant Q Transform (CQT) spectrum and Gammatone spectrum in different end-toend deep learning networks, where both spectrums are able to closely mimic the human speech perception and transform it into 2D images. Experimental results show the effectiveness of the proposed perception aware spectrum with end-to-end deep learning approach on Interspeech 2017 Computational Paralinguistics Cold sub-Challenge. The final fusion result of our proposed method is 8% better than that of the provided baseline in terms of UAR.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Crafting Adversarial Examples For Speech Paralinguistics Applications

Computational paralinguistic analysis is increasingly being used in a wide range of applications, including securitysensitive applications such as speaker verification, deceptive speech detection, and medical diagnostics. While state-ofthe-art machine learning techniques, such as deep neural networks, can provide robust and accurate speech analysis, they are susceptible to adversarial attacks. ...

متن کامل

Fusion of Multispectral Data Through Illumination-aware Deep Neural Networks for Pedestrian Detection

Multispectral pedestrian detection has received extensive attention in recent years as a promising solution to facilitate robust human target detection for around-the-clock applications (e.g. security surveillance and autonomous driving). In this paper, we demonstrate illumination information encoded in multispectral images can be utilized to significantly boost performance of pedestrian detect...

متن کامل

Group-wise Deep Co-saliency Detection

In this paper, we propose an end-to-end group-wise deep co-saliency detection approach to address the co-salient object discovery problem based on the fully convolutional network (FCN) with group input and group output. The proposed approach captures the group-wise interaction information for group images by learning a semantics-aware image representation based on a convolutional neural network...

متن کامل

On Multi-Domain Training and Adaptation of End-to-End RNN Acoustic Models for Distant Speech Recognition

Recognition of distant (far-field) speech is a challenge for ASR due to mismatch in recording conditions resulting from room reverberation and environment noise. Given the remarkable learning capacity of deep neural networks, there is increasing interest to address this problem by using a large corpus of reverberant far-field speech to train robust models. In this study, we explore how an end-t...

متن کامل

End-to-end Learning from Spectrum Data: A Deep Learning approach for Wireless Signal Identification in Spectrum Monitoring applications

This paper presents end-to-end learning from spectrum data an umbrella term for new sophisticated wireless signal identification approaches in spectrum monitoring applications based on deep neural networks. End-to-end learning allows to (i) automatically learn features directly from simple wireless signal representations, without requiring design of hand-crafted expert features like higher orde...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017